library(phenopix)
library(jpeg)
library(rgdal) # seems to be necessary for `extractVIs` function, but Kelsey hasn't figured out why yet
library(zoo)
library(tseries)
library(reshape2)
library(strucchange)
library(ggplot2)
library(cowplot)
# To run the code in the console, set the working directory. To knit, do NOT set the working directory as the root directory should be set directly in the knitr options (code chunk #1)
setwd("/Users/elwoodk/Google_Drive/courses/earth-analytics/final-project/phenocam/RGB/")
This tutorial is built primarily based on the Vignette for phenopix available with the code: vignette("base", package = "phenopix")
Structuring a folder tree useful for the analysis
Giving a good structure to your analysis can make all subsequent steps simple and straightforward. If you are running a site that records images you will be dealing with quite heavy folders (with likely multiple years of data, hence some thousand files of images) that you need to handle with care. We suggest separate folders for each site (of course) but also year of analysis. Each year folder should contain a sub-folder with all images to be processed (/IMG), one folder containing the reference image, i.e. the image you will use to draw your ROI (/REF), one folder containing data for the region of interest (/ROI) and one folder containing extracted vegetation indexes (/VI). The function structureFolder() provides a facility to create appropriate sub-folders. To use, set the working directory to be the folder that is within the site, within the year.
In the example below, we are only using one year of data, but have 2 types of images: RGB and RGB + IR. To start, let’s just use the RGB images, so we will set the working directory to the phenocam/RGB folder. In this folder, the file structure is as described above, with four subfolders: /IMG, /REF, /ROI, and /VI. This data is from the “harvardlph” PhenoCam data in Harvard Forest, MA from April 1, 2016 to December 1, 2016 with 30-minute daily capture from 10:00 AM to 2:00 PM. The data is publicly available from https://phenocam.sr.unh.edu/. I have set the working directory to be the RGB folder.
# Use the code below to create the sub-folders recommended by the `phenopix` package
# You only need to run this code once
my.path <- structureFolder(path = getwd(), showWarnings = FALSE)
str(my.path)
Drawing a region of interest (ROI)
Apart from structuring folders, drawing an ROI is the first, hence most important step of the analysis. The procedure is based on two steps: first, a reference image (chosen by the user) is plotted by calling function readJPEG() from package jpeg and rasterImage(). In Fig. 1 is the reference image from the harvardlph site and the code used to plot the image. We first define an easy plotting function to print on screen images.
.plotImage <- function(image, ...) {
ncols <- ncol(image)
nrows <- nrow(image)
suppressWarnings(plot(0,
type = "n", xlim = c(0, ncols), ylim = c(0, nrows), ...))
suppressWarnings(rasterImage(image,
xleft = 0,
ybottom = 0,
xright = ncols,
ytop = nrows, ...))
}
img <-jpeg::readJPEG("REF/harvardlph_2016_07_15_120007.jpg") # this is the only image in the REF folder. It was chosen and put into the `/REF` folder manually using the finder.
.plotImage(img)
Figure 1: A jpeg image printed on a graphic device using readJPEG() and rasterImage() embedded in the .plotImage() function
To draw the ROI, use the following code:
DrawROI(path_img_ref = "REF/harvardlph_2016_07_15_120007.jpg", # the folder of your reference image
path_ROIs = "ROI/", # the path in your computer where to store RData with ROI properties
nroi = 1, # number of ROIs (for two ROIs, you can use concatenate)
roi.names = c("canopy"), # list of ROI names (in order)
file.type = ".jpg")
A call to the function opens a graphic device and allows the use of locator() to define your ROI(s). Note that the use of locator is somewhat system specific. Check out the help file ?locator for more details. Locator allows the user to draw a polygon by left-clicking vertices and then right-clicking (or press ESC on MacOS) to close the polygon. If you have chosen more than one ROI, after closing your first polygon, the image will appear again unmodified to draw the second ROI, and so on. Note that the plot title includes the name of the ROIs you are drawing. When you are done, in your ROIs folder an .RData file called roi.data.RData and a .jpg file of each ROI will be stored. The RData file is actually a list with the following structure:
load('ROI/roi.data.Rdata')
str(roi.data)
## List of 1
## $ canopy:List of 2
## ..$ pixels.in.roi:'data.frame': 1244160 obs. of 3 variables:
## .. ..$ rowpos: num [1:1244160] 0.000772 0.001543 0.002315 0.003086 0.003858 ...
## .. ..$ colpos: num [1:1244160] 0.000772 0.000772 0.000772 0.000772 0.000772 ...
## .. ..$ pip : int [1:1244160] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ vertices :List of 2
## .. ..$ x: num [1:6] 0.252 0.717 0.992 0.683 0.325 ...
## .. ..$ y: num [1:6] 0.0452 0.0521 0.5416 0.6101 0.6015 ...
There are 2 elements list for each ROI. Each element is again a list containing two elements. One is a data.frame containing coordinates of all image pixels, together with a code indicating whether the given pixel belongs to the ROI or not. The second is a list with the coordinates of ROI margins as in output from locator().
In the ROIs folder, separate jpeg files for each of your regions of interest are stored. A call to the function printROI() allows to plot in the same graph all existing ROIs for a picture. In the example from Harvard, only one ROI was drawn. Here is the code to generate the plot in fig. 2:
# Using arguments in the vignette
PrintROI(path_img_ref = 'REF/harvardlph_2016_07_15_120007.jpg',
path_ROIs = 'ROI/',
which = 'all',
col = palette())
Figure 2: A plot of the regions of interest (ROIs), in output from PrintROI()
NOTE: When you draw an ROI on your best quality image (say 640 x 428 pixels) you will probably need to identify the same ROI in smaller size images. This will be the case, for example, if you want to conduct a pixel-based analysis, illustrated later on. Pixel-based analysis is computationally intense and therefore it is suggested to run it on rather small size images. The function updateROI() allows to recalculate pixels falling within a given ROI in images of different size compared to the one where the ROI was first drawn.
Usage is…
# img2 <-jpeg::readJPEG("REF/harvardlph_2016_07_15_120007.jpg")
# updateROI(roi.data, img2)
… where old.roi is the original roi.data object, new.img is the re-sized image. A new object with same structure as the original roi.data is returned.
*** This is where I have finished reviewing/editing to be relevant to the harvard data set
Filter out data
Data retrieved from images often need robust methods for polishing the time series. Bad weather conditions, low illumination, dirty lenses are among the most common issues that determine noise in the time series of vegetation indexes. Accordingly, phenopix has a function autoFilter() based on 4 different approaches. There are multiple filters to use: “night”, “blue”, “mad”, “max”, and “spline”.
Filters are applied in the order listed in the argument. Night filter removes records under a certain gcc value (as specified in filter.options). The default is 0.2. Blue filter is intended to remove bad images and is very aggressive. It is suggested only for very low quality images. The daily mean and standard deviation on bcc is computed and a sd threshold is computed as the quantile of standard deviations with prob = 0.05. An envelope is then computed as daily mean bcc +/- the calculated threshold. Raw data outside this envelope are discarded. The mad filter is applied according to Papale et al 2006 (biogeosciences) created to remove spikes on FLUXNET data. The max filter is based on Sonnentag et al (2012) and computes the 90% of the curve based on a three days moving window. The spline filter is based on Migliavacca et al (2011).
The function is designed to receive in input a data.frame structured as in output from extractVIs, hence its default expression may appear rather complicated.
Note that the function autoFilter is unsuccessful when there are duplicate rows. To avoid this problem, Kelsey (with the help of Caitlin White) added the function unique() before the data.frame to avoid problems.
VI.data.veg <- as.data.frame(VI.data$canopy)
VI.data.veg2 <- unique(VI.data$canopy)
filtered.data <- autoFilter(data = unique(VI.data$canopy),
dn=c('ri.av', 'gi.av', 'bi.av'),
brt = 'bri.av',
filter = c("night", "spline", "max"),
na.fill = TRUE)
Figure 5: Raw and filtered relative greenness index, default plot of function autoFilter()
str(filtered.data)
## 'zoo' series from 2016-04-01 to 2016-12-01
## Data: num [1:243, 1:7] 0.408 0.429 0.395 0.351 0.407 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:7] "rcc" "gcc" "bcc" "brt" ...
## Index: POSIXct[1:243], format: "2016-04-01" "2016-04-02" "2016-04-03" "2016-04-04" "2016-04-05" ...
str(VI.data)
## List of 1
## $ canopy:'data.frame': 2177 obs. of 18 variables:
## ..$ date : POSIXct[1:2177], format: "2016-04-01 10:00:00" ...
## ..$ doy : num [1:2177] 92 92 92 92 92 92 92 92 92 93 ...
## ..$ r.av : num [1:2177] 123 123 122 123 121 ...
## ..$ g.av : num [1:2177] 97.3 99.5 98.2 102.9 97 ...
## ..$ b.av : num [1:2177] 82 85 84.1 90.8 82 ...
## ..$ r.sd : num [1:2177] 21.9 23.9 22.7 27.2 22.7 ...
## ..$ g.sd : num [1:2177] 20.8 24.6 23.8 30.3 22.8 ...
## ..$ b.sd : num [1:2177] 22.4 26.9 26.5 33.9 25 ...
## ..$ bri.av: num [1:2177] 302 308 304 317 300 ...
## ..$ bri.sd: num [1:2177] 62.7 73.3 71.2 90 68.3 ...
## ..$ gi.av : num [1:2177] 0.322 0.323 0.322 0.324 0.323 ...
## ..$ gi.sd : num [1:2177] 0.01135 0.01072 0.01024 0.00943 0.01115 ...
## ..$ gei.av: num [1:2177] -10.17 -9.36 -9.41 -8.5 -8.85 ...
## ..$ gei.sd: num [1:2177] 8.79 8.25 7.73 7.15 8.37 ...
## ..$ ri.av : num [1:2177] 0.411 0.406 0.406 0.397 0.408 ...
## ..$ ri.sd : num [1:2177] 0.0331 0.0351 0.0356 0.0361 0.0353 ...
## ..$ bi.av : num [1:2177] 0.268 0.271 0.272 0.28 0.269 ...
## ..$ bi.sd : num [1:2177] 0.0299 0.0314 0.0317 0.0321 0.0317 ...
plot(VI.data.veg)

In the structure of the output data.frame of filtered.data there are three important points:
- We introduce here a new class of R objects (zoo). From here on all further analyses are based on zoo (or, to a lesser extent ts) time series. The time index of the data is numeric day of year (doy). As a consequence, the attribute year is lost at this step of the analysis (we suggest to include it in the object name)
- The function autoFilter aggregates the data at a daily time step by default. The returned data.frame contains unfiltered (but still daily aggregated) color indexes (here called gcc, rcc, and bcc, cc standing for chromatic coordinate) and a column of data for each filtering step. The name of the filter applied is reported in the column name.
- The argument na.fill defaults to TRUE, meaning that NA already existing in the VI.data (unlikely) or data discarded by the filtering procedure (much more likely) are filled by linear approximation (using na.approx from zoo pack- age. This is done because the subsequent fitting step requires no NA appearing in the time series. If a user wants to have control on the discarded data and e.g. customize the gap-filling we recommend setting na.fill to FALSE. For those unfamiliar with the zoo structure we created a function convert to convert from zoo to a normal data.frame:
dataframed <- convert(filtered.data, year='2017')
However, we strongly recommend to get familiar with the zoo package since it has wonderful facilities for plotting, aggregating and filling time series. Filters are based on methods relying on different parameters that can be tuned by the user (called filter options). A function allows to return default filter options that can be in turn changed.
my.options <- get.options()
names(my.options) # a named list, one element for each filter
## [1] "night.filter" "blue.filter" "mad.filter" "max.filter"
## [5] "spline.filter"
## see help file for the meaning
my.options$max.filter$qt <- 0.95 # use 95th percentile instead of 90th for max.filter
filtered.data2 <- autoFilter(unique(VI.data$canopy), filter.options=my.options, plot=FALSE)
plot(filtered.data$max.filtered) ## default options
lines(filtered.data2$max.filtered, col='red') # customized options
legend('topleft', col=palette()[1:2], lty=1, legend=c('90th', '95th'), bty='n')
Figure 6: Effect (not that large indeed) of changing filter options with function autoFilter()
Fit a curve to the data
The seasonal trajectory of greenness index of a vegetation canopy provides per se important information, but to turn qualitative information into quantitative data we need to make some more computation. Traditionally, data similar to these (e.g. satellite-based NDVI trajectories) are processed in two main ways:
extract time thresholds based on a percentage of development (e.g. the day when half of the maximum value of the index is reached);
fit a curve and extract relevant thresholds based on curve properties.
In the package phenopix both possibilities are available. The core function for data fitting and phenophase extraction is greenProcess(). This function calls and is related to several rather internal functions that perform the different fittings. Available fittings include:
the fit of a cubic spline
the fit of an equation proposed by Beck et al. (2006)
the fit of an equation proposed by Elmore et al. (2012)
the fit of an equation proposed by Klosterman et al. (2014) with two implementations
the fit of an equation proposed by Gu et al. (2009)
All fits are based on a double - logistic function with a different number of parameters.
After curve fitting, relevant dates in the seasonal trajectory (aka phenophases) are extracted with different methods:
- A method called trs which splits the seasonal course into increasing and decreasing trajectory based on the sign of the first derivative and then identifies a given threshold (by default the 50%) of both the increasing and decreasing trajectory. It allows to determine start of season (sos), end of season (eos) and length of season (los) as the difference between the two.
A method called derivatives which extends trs in that it also calculates maximum growing and decreasing rates
A method based on Klosterman approach which individuates 4 moments in the seasonal trajectory. Greenup represents the beginning of growth, maturity represents the reaching of some summer plateau, senescence represents the beginning of green decrease (or yellowing increase) and dormancy represents the end of the growing season.
A method based on Gu approach which individuates 4 moments and some other curve parameters. The 4 relevant moments do not differ in their meaning compared to Klosterman phases, and are called upturn date (UD), stabilization date (SD), downturn date (DD), and recession date (RD).
Detail on curve fitting and phenophase extraction is provided in the help function of ?greenProcess as well as in the help files of other more internal functions such as ?KlostermanFit, ?GuFit, ?PhenoExtract. In fig.6 we show 4 different fitting methods applied to the same data (Harvard Forest). But let’s first have a look at the arguments of greenProcess:
ts is the zoo time series in input. It must be a time series with no NA. Arguments fit and threshold allows to choose the fitting and phenopahse methods, respectively. plot is a logical determining if a plot is returned or not, which is pertinent only if fit = ‘klosterman’, uncert is a logical for uncertainty computation, for which number of replicates is controlled by nrep. envelope and quantiles will be detailed later. hydro is a logical indicating whether days must be converted to hydrodays before the analysis, where october 1st will be doy 1 and so on (designed for southern hemisphere or for winter-growing plants). Since phenopix version > 2.0 the uncertainty estimation benefits from parallelization, for which arguments ncores controls the number of cores used in parallel computation, default is ‘all’ and the actual number of cores you want to use can be set with an integer. Parallelization is performed by calling function foreach in the foreach package.
## spline curve + trs phenophases
fit1 <- greenProcess(ts = filtered.data$max.filtered,
fit = 'spline',
threshold = 'trs',
plot=FALSE)
summary(fit1)
##
## Data
## Index observed
## Min. : 92.0 Min. :0.2946
## 1st Qu.:152.5 1st Qu.:0.3229
## Median :213.0 Median :0.3879
## Mean :213.5 Mean :0.3687
## 3rd Qu.:274.5 3rd Qu.:0.3977
## Max. :336.0 Max. :0.4258
##
## Predicted
## Index predicted
## Min. : 92.0 Min. :0.3095
## 1st Qu.:152.5 1st Qu.:0.3233
## Median :213.0 Median :0.3898
## Mean :213.5 Mean :0.3687
## 3rd Qu.:274.5 3rd Qu.:0.3976
## Max. :336.0 Max. :0.4174
##
## Formula
## NULL
##
## Thresholds
## sos eos los pop mgs rsp
## 138.0000000 286.0000000 148.0000000 160.0000000 0.3964404 NA
## rau peak msp mau
## NA 0.4173794 0.3664767 0.3634738
## Beck fitting + derivatives
fit2 <- greenProcess(ts = filtered.data$max.filtered,
fit = 'beck',
threshold = 'derivatives',
plot=FALSE)
## klosterman fitting + klosterman phenophases
fit3a <- greenProcess(ts = filtered.data$max.filtered,
fit = 'klosterman',
threshold = 'klosterman',
uncert = FALSE, # Increases computation time
nrep = 100, # For this data set, nrep = 100 takes about 8 minutes
which = "heavy", # "This argument performs an optimization similar in concept to the one performed in FitDoubleLogKlLight but with additional recursive optimization which is about 10 times more time consuming but allows for a better representation of the data. It it suggested to fit the light version of the equation and if the fit is not good enought, check this function out."
plot=FALSE)
summary(fit3a)
##
## Data
## Index observed
## Min. : 92.0 Min. :0.2946
## 1st Qu.:152.5 1st Qu.:0.3229
## Median :213.0 Median :0.3879
## Mean :213.5 Mean :0.3687
## 3rd Qu.:274.5 3rd Qu.:0.3977
## Max. :336.0 Max. :0.4258
##
## Predicted
## Index predicted
## Min. : 92.0 Min. :0.2971
## 1st Qu.:152.5 1st Qu.:0.3292
## Median :213.0 Median :0.3806
## Mean :213.5 Mean :0.3671
## 3rd Qu.:274.5 3rd Qu.:0.4031
## Max. :336.0 Max. :0.4101
##
## Formula
## expression((a1 * t + b1) + (a2 * t^2 + b2 * t + c) * (1/(1 +
## q1 * exp(-B1 * (t - m1)))^v1 - 1/(1 + q2 * exp(-B2 * (t -
## m2)))^v2))
##
## Thresholds
## Greenup Maturity Senescence Dormancy
## 111 158 253 336
## gu fitting and phenophases
fit4 <- greenProcess(ts = filtered.data$max.filtered,
fit = 'gu',
threshold = 'gu',
plot=FALSE)
summary(fit4)
##
## Data
## Index observed
## Min. : 92.0 Min. :0.2946
## 1st Qu.:152.5 1st Qu.:0.3229
## Median :213.0 Median :0.3879
## Mean :213.5 Mean :0.3687
## 3rd Qu.:274.5 3rd Qu.:0.3977
## Max. :336.0 Max. :0.4258
##
## Predicted
## Index predicted
## Min. : 92.0 Min. :0.3141
## 1st Qu.:152.5 1st Qu.:0.3231
## Median :213.0 Median :0.3888
## Mean :213.5 Mean :0.3687
## 3rd Qu.:274.5 3rd Qu.:0.4033
## Max. :336.0 Max. :0.4046
##
## Formula
## expression(y0 + (a1/(1 + exp(-(t - t01)/b1))^c1) - (a2/(1 + exp(-(t -
## t02)/b2))^c2))
##
## Thresholds
## UD SD DD RD maxline
## 131.99385793 143.09621465 280.49413430 304.18645354 0.87663946
## baseline prr psr plateau.slope
## -0.02100819 0.08085199 -0.03237116 -0.00120529
## elmore
## gu fitting and phenophases
fit5 <- greenProcess(ts = filtered.data$max.filtered,
fit = 'elmore',
threshold = 'klosterman',
plot=FALSE)
summary(fit5)
##
## Data
## Index observed
## Min. : 92.0 Min. :0.2946
## 1st Qu.:152.5 1st Qu.:0.3229
## Median :213.0 Median :0.3879
## Mean :213.5 Mean :0.3687
## 3rd Qu.:274.5 3rd Qu.:0.3977
## Max. :336.0 Max. :0.4258
##
## Predicted
## Index predicted
## Min. : 92.0 Min. :0.3218
## 1st Qu.:152.5 1st Qu.:0.3218
## Median :213.0 Median :0.3856
## Mean :213.5 Mean :0.3687
## 3rd Qu.:274.5 3rd Qu.:0.4009
## Max. :336.0 Max. :0.4151
##
## Formula
## expression(m1 + (m2 - m7 * t) * ((1/(1 + exp(((m3/m4) - t)/(1/m4)))) -
## (1/(1 + exp(((m5/m6) - t)/(1/m6))))))
##
## Thresholds
## Greenup Maturity Senescence Dormancy
## 132 149 284 301
par(mfrow = c(3,2))
plot(fit1,
type='p',
pch=20,
col='grey',
xlab = "DOY")
summary(fit2)
##
## Data
## Index observed
## Min. : 92.0 Min. :0.2946
## 1st Qu.:152.5 1st Qu.:0.3229
## Median :213.0 Median :0.3879
## Mean :213.5 Mean :0.3687
## 3rd Qu.:274.5 3rd Qu.:0.3977
## Max. :336.0 Max. :0.4258
##
## Predicted
## Index predicted
## Min. : 92.0 Min. :0.3221
## 1st Qu.:152.5 1st Qu.:0.3244
## Median :213.0 Median :0.4056
## Mean :213.5 Mean :0.3760
## 3rd Qu.:274.5 3rd Qu.:0.4097
## Max. :336.0 Max. :0.4097
##
## Formula
## expression(mn + (mx - mn) * (1/(1 + exp(-rsp * (t - sos))) +
## 1/(1 + exp(rau * (t - eos)))))
##
## Thresholds
## sos eos los pop mgs
## 138.000000000 289.000000000 151.000000000 185.000000000 0.405554965
## rsp rau peak msp mau
## 0.007890304 -0.003397248 0.409702205 0.366088410 0.364331237
plot(fit2,
type='p',
pch=20,
col='grey')
plot(fit3a,
type = 'p',
pch = 20,
col = 'grey')
plot(fit4,
type='p',
pch=20,
col='grey')
plot(fit5,
type='p',
pch=20,
col='grey')
Figure 6: Comparison of curve fittings and phenophase calculation methods (aka “thresholds”.)
## show all together
par(mfrow = c(1,1))
t <- as.numeric(format(index(filtered.data$max.filtered), '%j'))
par(lwd=3)
plot(t, dataframed$max.filtered, type='p', pch=20, ylab='Green chromatic coordinate', xlab='DOYs')
lines(fitted(fit1), col='blue')
lines(fitted(fit2), col='red')
lines(fitted(fit3a), col='green')
lines(fitted(fit4), col='violet')
legend('topleft', col=c('blue', 'red', 'green', 'violet'),
lty=1, legend=c('Spline', 'Beck', 'Klosterman', 'Gu'), bty='n')
Figure 7: Comparison of 4 different fittings from phenopix package
The function greenProcess creates an object of class phenopix with dedicated methods. The summary function displays a summary of the input data and of the predicted points. It then reports the formula of the fitting equation, if pertinent, see e.g. summary of fit1 which is not based on an equation.
Phenophases are printed as well. Note also the fitted function applied to phenopix object that returns a zoo time series of fitted values that can be directly lined to the plot.
To complete the overview on display generic methods applied to the objects of class phenopix here is the application of generic plot (fig.8) and print functions:
The plot function (below) returns a plot similar to the one constructed above, except that extracted phenophases are also shown the as vertical colored lines.
plot(fit4,
pch=20,
col='grey',
type='p',
xlab='DOYs',
ylab='Green chromatic coordinates')
Figure 8: Generic plot function applied to phenopix objects
The print function (below) returns information similar to summary but it also reports which fitting and phenophase methods were used, and if the uncertainty was estimated.
Fig.5 shows that different fitting equation lead to very similar fitted values on the example from (Harvard Forest) data. For the sake of robustness, in such situation it is preferable to choose a fitted equation rather than a spline fit. Let’s decide to choose the fitting from Gu. Now let’s look from closer how do the different phenophase extraction methods impact when applied to the same fitted curve in fig.9 (and note the use of update generic function with method phenopix)
print(fit4)
##
## #### phenopix time series processing ####
##
## FITTING: GU
##
## PREDICTED VALUES:
## Index predicted
## Min. : 92.0 Min. :0.3141
## 1st Qu.:152.5 1st Qu.:0.3231
## Median :213.0 Median :0.3888
## Mean :213.5 Mean :0.3687
## 3rd Qu.:274.5 3rd Qu.:0.4033
## Max. :336.0 Max. :0.4046
##
## FITTING EQUATION:
## expression(y0 + (a1/(1 + exp(-(t - t01)/b1))^c1) - (a2/(1 + exp(-(t -
## t02)/b2))^c2))
##
## FITTING PARAMETERS:
## y0 a1 a2 t01 t02
## 0.06777627 0.81096539 0.89974985 126.24936560 303.55806891
## b1 b2 c1 c2
## 3.59347356 1.27076400 19.21057048 0.05710031
##
## THRESHOLDS: GU
## UD SD DD RD maxline
## 131.99385793 143.09621465 280.49413430 304.18645354 0.87663946
## baseline prr psr plateau.slope
## -0.02100819 0.08085199 -0.03237116 -0.00120529
##
## UNCERTAINTY: FALSE
## N of replications = 0
##
## HYDROLOGICAL DAY OF YEAR: FALSE
fit4.trs <- update(fit4, 'trs', plot=FALSE)
fit4.klosterman <- update(fit4, 'klosterman', plot=FALSE)
fit4.gu <- update(fit4, 'gu', plot=FALSE)
par(mfrow=c(2,2), oma=rep(5,4,4,2), mar=rep(0,4))
plot(fit4.trs, type='n', main='', xaxt='n')
mtext('trs', 3, adj=0.1, line=-2)
plot(fit4.klosterman, type='n', main='', xaxt='n', yaxt='n')
mtext('klosterman', 3, adj=0.1, line=-2)
plot(0, type='n', axes=FALSE, xlab='', ylab='')
plot(fit4.gu, type='n', main='', yaxt='n')
axis(4)
mtext('gu', 3, adj=0.1, line=-2)
Figure 9: Three phenophase methods applied to the Gu fitting
The trs thresholds (50% of increasing and decreasing trajectory) hold a different meaning compared to Klosterman and Gu phenophases. The latter two show good correspondence except that the Klosterman s beginning of senescence occurs later compared to correspondent phase in Gu thresholds (i.e DD, downturn date).
We have shown 4 different approaches to mathematically describe the seasonal trajectory of greenness, with additionally 5 methods to extract phenophases on the obtained curves. The combination of curves and phenophase methods leads to as many as 20 possible approaches to describe a seasonal trajectory. Sometimes it could be useful to make a decision on which curves and phenophases to use, without computing the uncertainty on all of them. To do so we have designed two functions that provide a quick overview on what would be the best fit and phenophase method for your actual trajectory. Here is how to compute the 20 combinations of fit and uncertainty in a single function:
explored <- greenExplore(filtered.data$max.filtered)
## [1] "Fitting spline 1/5"
## [1] "Fitting Beck 2/5"
## [1] "Fitting Elmore 3/5"
## [1] "Fitting Klosterman 4/5"
## [1] "Fitting Gu 5/5"
explored is a list with 20 + 1 elements, i.e. the 20 combinations + a vector containing the RMSEs from each of the 4 fittings. This object will only be used as argument of the plotExplore() function (fig.10):
plotExplore(explored)
Figure 10: Overview of all combinations of curves and fits as obtained by the plotExplore function
The plot in fig.10 shows the impact of different fittings (moving up-downwards) and different phenophases (from left to right) on the same data (Harvard Forest). The RMSE (root-mean-square error) for each of the four fitting methods is also annotated in the first column. Lower error values represent smaller differences between the modeled value and the actual values (i.e. residuals) and therefore lower RMSE values represent better model fits. This plot might be useful to choose the most appropriate fitting and thresholding methods on your data.
greenProcess is a wrapper function that allows the user to define the fitting and phenophase methods as arguments. The “primitive” functions that actually perform the fits are the following: BeckFit, ElmoreFit, KlostermanFit and so on. Their usage is generally:
args(ElmoreFit)
## function (ts, uncert = FALSE, nrep = 100, ncores = "all", sf = quantile(ts,
## probs = c(0.05, 0.95), na.rm = TRUE))
## NULL
with the most important argument beeing ts, the time series. Compared to using greenProcess, the single fitting functions have the advantage to allow more flexibility but in general the user won’t need to use them.
The phenophase extraction methods also have a dedicated wrapper function already embedded in the greenProcess() function, PhenoExtract() which usage is:
args(PhenoExtract)
## function (data, method = "trs", uncert = FALSE, breaks = 3, envelope = "quantiles",
## quantiles = c(0.1, 0.9), plot = TRUE, sf, ...)
## NULL
where the argument method allows to choose the phenophase method. Note that input data in this case should be a fitted time series in output from e.g. FitDoubleLogElmore and not a phenopix object in output from greenProcess. Here is an example:
# Both method 1 and 2 should produce the same values
# Method 1
fit.elmore <- greenProcess(filtered.data$max.filtered, 'elmore', 'trs', plot=FALSE)
extract(fit.elmore, 'metrics')
## sos eos los pop mgs rsp
## NA 290.0000000 NA 145.0000000 0.3973447 NA
## rau peak msp mau
## NA 0.4150725 NA 0.3575157
# Method 2
fit.elmore.2 <- ElmoreFit(filtered.data$max.filtered)
PhenoExtract(fit.elmore.2, 'trs', plot=FALSE)
## $metrics
## sos eos los pop mgs rsp
## NA 290.0000000 NA 145.0000000 0.3973447 NA
## rau peak msp mau
## NA 0.4150725 NA 0.3575157
##
## $unc.df
## NULL
The uncertainty estimation
One main functionality of the package is the uncertainty estimation. This is performed in different ways depending on the fitting equation. The basic idea behind the uncertainty estimation is how good the smoothing curve fits to the data. The residuals between fitted and observed is therefore used to generate random noise to the data and fitting is applied recursively to randomly-noised original data. This procedure results in an ensemble of curves, curve parameters and extracted phenophases that represent the uncertainty estimate. The uncertainty on curve parameters is automatically propagated to phenophase extraction. In the following example the uncertainty estimation is performed on Harvard Forest data fitted with the approach of Gu et al. (2009), with 100 replications. Here is the code:
# Takes about 1 minute to compute:
fit.complete <- greenProcess(ts = filtered.data$max.filtered,
fit = 'gu',
threshold= 'gu',
plot = FALSE,
uncert = TRUE,
nrep = 100)
## [1] "estimated computation time (8 cores): 1 mins"
And here is fit.complete printed:
print(fit.complete)
##
## #### phenopix time series processing ####
##
## FITTING: GU
##
## PREDICTED VALUES:
## Index predicted
## Min. : 92.0 Min. :0.3141
## 1st Qu.:152.5 1st Qu.:0.3231
## Median :213.0 Median :0.3888
## Mean :213.5 Mean :0.3687
## 3rd Qu.:274.5 3rd Qu.:0.4033
## Max. :336.0 Max. :0.4046
##
## FITTING EQUATION:
## expression(y0 + (a1/(1 + exp(-(t - t01)/b1))^c1) - (a2/(1 + exp(-(t -
## t02)/b2))^c2))
##
## FITTING PARAMETERS:
## y0 a1 a2 t01 t02
## 0.06777627 0.81096539 0.89974985 126.24936560 303.55806891
## b1 b2 c1 c2
## 3.59347356 1.27076400 19.21057048 0.05710031
##
## THRESHOLDS: GU ENVELOPE:QUANTILES
## UD SD DD RD maxline baseline
## 10% -9.907452e+17 -1.827194e+19 198.0000 210.6268 0.8108180 -0.0188717161
## 50% 1.335436e+02 1.449789e+02 198.0000 211.5403 0.8155323 -0.0120058988
## 90% 6.032941e+17 2.946881e+19 285.6357 299.4923 0.8387973 -0.0002047714
## prr psr plateau.slope
## 10% 0.0000000 -0.06620943 -0.0008683386
## 50% 0.0000000 -0.06190401 -0.0006223380
## 90% 0.0851251 -0.05191868 0.0108112489
##
## UNCERTAINTY: TRUE
## N of replications = 100
##
## HYDROLOGICAL DAY OF YEAR: FALSE
As you can see from the output, the default behavior of greenProcess() for the computation of uncertainty is to provide the median, 10th and 90th percentile of the uncertainty ensemble. This may be changed by modifying the envelope argument. The other possible option is min-max to get minimum mean and maximum. In addition, the quantiles to be chosen with envelope = quantiles can be changed by modifying the quantile argument. Here is the example:
print(update(fit.complete, 'gu', envelope='min-max', plot = FALSE))
##
## #### phenopix time series processing ####
##
## FITTING: GU
##
## PREDICTED VALUES:
## Index predicted
## Min. : 92.0 Min. :0.3141
## 1st Qu.:152.5 1st Qu.:0.3231
## Median :213.0 Median :0.3888
## Mean :213.5 Mean :0.3687
## 3rd Qu.:274.5 3rd Qu.:0.4033
## Max. :336.0 Max. :0.4046
##
## FITTING EQUATION:
## expression(y0 + (a1/(1 + exp(-(t - t01)/b1))^c1) - (a2/(1 + exp(-(t -
## t02)/b2))^c2))
##
## FITTING PARAMETERS:
## y0 a1 a2 t01 t02
## 0.06777627 0.81096539 0.89974985 126.24936560 303.55806891
## b1 b2 c1 c2
## 3.59347356 1.27076400 19.21057048 0.05710031
##
## THRESHOLDS: GU ENVELOPE:MIN-MAX
## UD SD DD RD maxline baseline
## min -7.405566e+18 -2.009143e+20 197.0000 209.7083 0.8065911 -0.024507113
## mean -2.406225e+17 6.183363e+18 232.8477 246.3947 0.8223564 -0.010615201
## max 5.363112e+18 2.408786e+20 287.0987 304.2033 0.8579503 0.005880146
## prr psr plateau.slope
## min 0.00000000 -0.07056026 -0.001126931
## mean 0.03212879 -0.06006440 0.003896021
## max 0.09137311 -0.03240724 0.010921226
##
## UNCERTAINTY: TRUE
## N of replications = 100
##
## HYDROLOGICAL DAY OF YEAR:
Beside the few options available by default and described above, the uncertainty data.frame is accessible via the extract command, by running:
extract(fit.complete, 'metrics.uncert') ## get threshold uncertainty data`
extract(fit.complete, 'params.uncert') ## get parameters of each fitting curve`
For example, if you want to use phenophases extracted from the true model and construct uncertainty envelope on them, you can access the uncertainty data.frame by the commands given above. Note than when the uncertainty is computed, also the plot function changes its behavior, in that it also shows the uncertainty curve ensemble and an error bar on extracted phases (fig.10).
plot(fit.complete, type='p', pch=20)
Figure 11: The Uncertainty Estimation (100 rep) on Klosterman fit and Gu phenophases
The distribution of uncertainty parameters (phenophases + curve parameters) can also be evaluated by means of box-plots with an extra option to the default plot method:
plot(fit.complete, what='thresholds')


By using the update function you can also extract phenophases according to a different method, without refitting the data. Here is the code:
update(fit.complete, 'klosterman', plot=FALSE)
##
## #### phenopix time series processing ####
##
## FITTING: GU
##
## PREDICTED VALUES:
## Index predicted
## Min. : 92.0 Min. :0.3141
## 1st Qu.:152.5 1st Qu.:0.3231
## Median :213.0 Median :0.3888
## Mean :213.5 Mean :0.3687
## 3rd Qu.:274.5 3rd Qu.:0.4033
## Max. :336.0 Max. :0.4046
##
## FITTING EQUATION:
## expression(y0 + (a1/(1 + exp(-(t - t01)/b1))^c1) - (a2/(1 + exp(-(t -
## t02)/b2))^c2))
##
## FITTING PARAMETERS:
## y0 a1 a2 t01 t02
## 0.06777627 0.81096539 0.89974985 126.24936560 303.55806891
## b1 b2 c1 c2
## 3.59347356 1.27076400 19.21057048 0.05710031
##
## THRESHOLDS: KLOSTERMAN ENVELOPE:QUANTILES
## Greenup Maturity Senescence Dormancy
## 10% 126 149 281 303
## 50% 128 150 281 305
## 90% 129 150 282 306
##
## UNCERTAINTY: TRUE
## N of replications = 100
##
## HYDROLOGICAL DAY OF YEAR:
Phenophase extraction method trs allows to set an extra argument that controls which threshold in the trajectory be used. Default is when 50% of seasonal maximum gcc is reached (indicated as 0.5). Let’s see how it works:
extract(update(fit.complete,
'trs',
trs = 0.5,
plot=FALSE),
'metrics') # default to 50% of increasing
## sos eos los pop mgs rsp rau peak msp mau
## 10% 137 290 151 148.0 0.3956421 NA NA 0.3979850 0.3576441 0.3542848
## 50% 138 290 152 210.0 0.3961202 NA NA 0.3984602 0.3603773 0.3590862
## 90% 138 290 153 211.5 0.3964265 NA NA 0.4008053 0.3616334 0.3600816
extract(update(fit.complete,
'trs',
trs=0.2,
plot=FALSE),
'metrics') # changed to 20%
## sos eos los pop mgs rsp rau peak msp mau
## 10% 127.9 296 163.9 148.0 0.3912752 NA NA 0.3979850 0.3312207 0.3373970
## 50% 128.0 296 168.0 210.0 0.3916978 NA NA 0.3984602 0.3323248 0.3385821
## 90% 132.1 296 169.0 211.5 0.3927517 NA NA 0.4008053 0.3418718 0.3394631
There is a last method to define thresholds on a time series that does not need a fitting. It implements the use of break points from the package strucchange and works as follows:
# uses `strucchange` package
par(mfrow = c(1,1))
print(phenopix::PhenoBP(x = filtered.data$max.filtered, breaks = 4, plot = TRUE, confidence= 0.99))

## bp1 bp2 bp3
## 0.5% 1463724000 1472191200 1476597600
## mean 1463810400 1473746400 1476856800
## 99.5% 1463896800 1473832800 1476943200
The user can set the maximum number of breakpoints to be identified, the confidence interval at which the calculation must be performed and an option or a plot. The output dataframe contains the day of the year for each of the breakpoints and their respective confidence intervals.
Pushing forward the analysis: pixel - based phenology
In order to thoroughly exploit the capabilities of an imagery archive, spatial analysis represents the most promising feature. Hence, specific functions are built to fit curves and extract phenophases on each pixel included in a region of interest instead of averaging the greenness index over the entire ROI. A specific vignette of this package is devoted to explain details on the pixel-based analysis.
12 Other functions
A number of other functions are available in the package, that do not necessarily enter the main workflow of the processing but still may be worth to mention.
plotVI() gets in input a VI.data data.frame as produced by extractVIs and reproduces the default plots from extractVIs. Useful when you use extractVIs with argument begin switched on and you want to update existing plots. hydrodoy to convert from and to hydrological day of year, to be used in conjuction with greenProcess with hydro=TRUE
—– From Filippa et al. (2016) - Agr. and Forest Meteorology —–
Main functions
The typical work-flow of the phenopix package is summarised in the flowchart shown in Fig. 2. First, one (or more) region(s) of interest (ROI) is (are) chosen, then digital colour numbers are extracted from the ROI of each image, and processed to obtain a seasonal time series. After filtering the time series, data are fitted with either a double logistic equation or a smoothing curve, on which phenological thresholds (phenophases) are extracted. Finally, uncertainty of the fit and of phenophases can be computed.
Regions of interest (ROIs)
The scene of the picture rarely includes only the targeted vegetation canopy, thus the user will want to choose a particular region within the scene for analysis. Even more often, more than one region may be of interest, for example in a mixed forest one might independently analyse different deciduous species and evergreen trees (e.g. Ahrends et al., 2008). The function DrawROI() allows the user to draw one or more regions of interest on-screen, using the mouse cursor on a chosen reference picture.
The arguments for the DrawROI() function are explained in the help package. However, a little clarity may be helpful. DrawROI(path_img_ref, path_ROIs, nroi = 1, roi.names, file.type=‘.jpg’) * path_img_ref refers to the name of the image that you want to explore. You do not need to add full path if the called image is in the working directory.
* path_ROIs refers to the folder where you want to save the ROI.
* nroi refers to the number of ROIs that you want to define in the image.
* roi.names refers to the name that you want to call your new ROI
DrawROI("nwt-bowman.jpg", path_ROIs = "/Users/elwoodk/Google Drive/ElwoodK_Research/Computation/Phenopix/ROIs", roi.names = "roi.test2", nroi = 1, file.type = ".jpg") # Click on vertices. Use 'Esc' to finish.
DrawROI(path_img_ref = "nwt-sena.jpg", path_ROIs = "/Users/elwoodk/Google Drive/ElwoodK_Research/Computation/Phenopix/ROIs/roi.test")